Detecting Localized Repeats in Genomic Sequences: A New Strategy and Its Application to Bacillus subtilis and Arabidopsis thaliana Sequences
Authors
Abstract
A new method for the search of local repeats in long DNA sequences, such as complete genomes, is presented. It detects a large variety of repeats varying in length from one to several hundred bases, which may contain many mutations. By mutations we mean substitutions, insertions or deletions of one or more bases. The method is based on counting occurrences of short words (3-12 bases) in sequence fragments called windows. A score is computed for each window, based on calculating exact word occurrence probabilities for all the words of a given length in the window. The probabilities are defined using a Bernoulli model (independent letters) for the sequence, using the actual letter frequencies from each window. A plot of the probabilities along the sequence for high-scoring windows facilitates the identification of the repeated patterns. We applied the method to the 1.87 Mb sequence of chromosome 4 of Arabidopsis thaliana and to the complete genome of Bacillus subtilis (4.2 Mb). The repeats that we found were classified according to their size, number of occurrences, distance between occurrences, and location with respect to genes. The method proves particularly useful in detecting long, inexact repeats that are local, but not necessarily tandem. The method is implemented as a C program called EXCEP, which is available on request from the authors.
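The window-scoring idea described in the abstract can be illustrated with a minimal Python sketch. This is not the EXCEP program and does not reproduce its exact word occurrence probabilities: as a labeled simplification, it approximates the probability of seeing a word at least as often as observed with a Poisson tail, which ignores word self-overlap. The Bernoulli model uses the letter frequencies of each window, as in the abstract. All function names (`word_counts`, `bernoulli_word_prob`, `poisson_tail`, `window_score`, `scan`) and the choice of scoring a window by its single most surprising word are assumptions made for illustration.

```python
from collections import Counter
from math import exp, lgamma, log


def word_counts(window, k):
    """Count every overlapping word of length k in the window."""
    return Counter(window[i:i + k] for i in range(len(window) - k + 1))


def bernoulli_word_prob(word, freqs):
    """Probability of the word at one position under independent letters."""
    p = 1.0
    for letter in word:
        p *= freqs[letter]
    return p


def poisson_tail(n, lam):
    """P(X >= n) for X ~ Poisson(lam), summed upward from n (requires lam > 0).

    Assumption: a Poisson approximation stands in for the exact word
    occurrence probabilities used by the method in the abstract.
    """
    if n <= 0:
        return 1.0
    term = exp(-lam + n * log(lam) - lgamma(n + 1))  # P(X == n)
    total = 0.0
    i = n
    while term > 0.0 and term >= total * 1e-17:
        total += term
        i += 1
        term *= lam / i  # P(X == i) from P(X == i - 1)
    return min(total, 1.0)


def window_score(window, k):
    """Score a window by its most over-represented word.

    Returns -ln of the smallest tail probability over all words of
    length k in the window (a hypothetical scoring rule, not EXCEP's).
    """
    n_positions = len(window) - k + 1
    # Letter frequencies are taken from the window itself.
    freqs = {a: c / len(window) for a, c in Counter(window).items()}
    best = 0.0
    for word, n in word_counts(window, k).items():
        lam = n_positions * bernoulli_word_prob(word, freqs)  # expected count
        tail = max(poisson_tail(n, lam), 1e-300)  # guard against log(0)
        best = max(best, -log(tail))
    return best


def scan(sequence, window_size, step, k):
    """Yield (start, score) for windows sliding along the sequence."""
    for start in range(0, len(sequence) - window_size + 1, step):
        yield start, window_score(sequence[start:start + window_size], k)
```

Under these assumptions, a window built from a repeated unit such as `"ACGTTGCA" * 10` scores well above a window whose 4-letter words are all distinct, and plotting the output of `scan` along a sequence gives a rough analogue of the probability plot mentioned in the abstract.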
Similar resources
Differential Expression of Arabidopsis thaliana Acid Phosphatases in Response to Abiotic Stresses
The objective of this research is to identify Arabidopsis thaliana genes encoding acid phosphatases induced by phosphate starvation. Multiple alignments of eukaryotic acid phosphatase amino acid sequences led to the classification of these proteins into four groups including purple acid phosphatases (PAPs). Specific primers were degenerated and designed based on conserved sequences of PAPs isol...
Cloning of the Gene Encoding M2e of Influenza Virus in B. subtilis
Background and Aims: The ectodomain of matrix protein of influenza virus is a weak immunogen that is highly conserved among all subtypes of influenza A virus. Tandem repeats of these genes along with linker were used to enhance immunogenicity of M2e protein and so it can be served as a universal vaccine in both humans and livestock. Materials and Methods: In this study, the sequences of extra-d...
Characterization of Arabidopsis thaliana telomeres isolated in yeast.
In an effort to learn more about the genomic organization of chromosomal termini in plants we employed a functional complementation strategy to isolate Arabidopsis thaliana telomeres in the yeast, Saccharomyces cerevisiae. Eight yeast episomes carrying A. thaliana telomeric sequences were obtained. The plant sequences carried on two episomes, YpAtT1 and YpAtT7, were characterized in detail. The...
Construction of chimeric protein 3M2e.FliC and its immunoinformatics analyses and expression in Bacillus subtilis
Introduction: Influenza A virus causes unpredictable epidemics and pandemics by creating antigenic variations. With the appearance of each new strain, rapid emergency countermeasures are taken against this new strain. Hence, designing an applicable and cross protective strategy to counter this virus is of great importance. To achieve this, choosing conserved antigenic regions in influenza virus...
FORRepeats: detects repeats on entire chromosomes and between genomes
Motivation: As more and more whole genomes are available, there is a need for new methods to compare large sequences and transfer biological knowledge from annotated genomes to related new ones. BLAST is not suitable to compare multimegabase DNA sequences. MegaBLAST is designed to compare closely related large sequences. Some tools to detect repeats in large sequences have already been developed...
Journal title:
- Computers & chemistry
Volume 24, Issue 1
Pages -
Publication date 2000